BLOP -- Fitting

The fitting algorithm
Specifying further fitting options
Specifying the chi2 function for minimization
Specifying the datapoints to be fitted
Setting the max number of iterations
Limiting the range of fitting, or imposing any other conditions on data points to be used for fitting
Setting verbose level
Specifying the convergence criterium
Fixing parameters
Obtaining results from the fit
Setting default values for the fit options

The fitting procedure is based on the Levenberg-Marquardt algorithm (as described for example in Numerical Recipes. The reported covariance matrix is the inverse of the hessian (second derivative matrix) of the chi2, with respect to the model's parameters.

Fitting currently does not work in the interpreted environment, only in a compiled code. Solution is already in progress.

Specifying options

The third argument to fit is a variable of type fitopt, which follows the main guidline of option classes. It is used to store any further specifications to the fitting procedure (see below)

What is chi2?

The usual chi2-fitting is the procedure which optimizes the chi2 defined as

chi2=sum_i [y_i - f(x_i)]^2/sigma_i^2

This procedure is a maximum likelihood method for such measurements, where the y values are normally distributed around their expectation value with a known sigma. However, many times this is not the case. For example when fitting a histogram, where the number of entries in a bin are distributed according to a multinomial or Poisson distribution. In this case the 'usual' chi2 fitting method is not a maximum-likelihood method. However, in many of these cases chi2 can be defined in such a way, that the minimization of it gives a maximum likelihood estimation for the parameters (see Nuclear Instruments and Methods in Physics Research 221 (1984) 437-442). Blop currently implements the chi2 defined for Poisson- and multinomially distributed values as well. The user can specify in a template argument of the fit function, which chi2 should be used:

 
fit<gauss_chi2>(....);      // normally distributed data, sigma_y is used
fit<poisson_chi2>(..);       
fit<multinomial_chi2>(...); // poisson- and multinomial-chi2, only the x,y values
                            // are used, sigma_y and sigma_x have no meaning

My compiler (g++ 3.3) does not support default template argument for this function, so in the compiled code one always has to write the chi2 explicitely within the <..>. However, in an interactive session this defaults to gauss_chi2, so if this is the desired chi2 function to be minimized, one can omit this parameter, and simply say:

fit(gr, model)

Beyond the above mentioned 3 chi2 functions the user can define an arbitrary chi2 function to be used in the fit. The user has to specify the per-data-point chi2 function (as a function of the datapoint-pair [and possibly their errors] x,y,sigma_x,sigma_y and the modelfunction's value func). The overall chi2 function to be minimized will be the sum of the per-data-point values for all measured points. This is done by first defining a class with the following 3 static member functions:

class my_chi2
{
  public:
    static double der0(int nx,double *x,double *sigma_x,int ny,double *y,double *sigma_y,double *func);
    static double der1(int nx,double *x,double *sigma_x,int ny,double *y,double *sigma_y,double *func,double *grad1);
    static double der2(int nx,double *x,double *sigma_x,int ny,double *y,double *sigma_y,double *func,double *grad1,double *grad2);
};

The meaning of the member functions is the following (see for example the definition of gauss_chi2 in fit.cc to have an example)

der0 calculates the per-data-point chi2 function's value for one (x,y) pair with errors sigma_x and sigma_y. (They all can be multi-valued, this is why they are provided in the form of an array with number of elements nx and ny, respectively). The array func contains the model function's value(s) at x.
For example, in the case of gauss_chi2, der0 returns
```
chi2(x,y,sigma_x,sigma_y,func) = \sum_{i=0}^{ny-1} (y[i]-func[i])^2/sigma_y[i]^2
```
der1 has to calculate the product of the per-data-point chi2 function's first derivative with grad1 (which contains the modelfunction's derivative with repect to one of its parameters).
For example, in the case of gauss_chi2, der1 returns
```
\frac{\partial}{\partial func} chi2(x,y,sigma_x,sigma_y,func)*grad1 = 2 * \sum_{i=0}^{ny-1} (func[i]-y[i])*grad1[i]/sigma_y[i]^2
```
der2 has to calculate the quadratic product of the per-data-point chi2 function's second drivative with grad1 and grad2 (which contain the modelfunction's derivatives with respect to the parameters).
For example, in the case of gauss_chi2, der2 returns
```
grad1 * \frac{\partial^2}{\partial func^2}chi2(x,y,sigma_x,sigma_y,func) * grad2 = \sum_{i=0}^{ny-1} grad1[i]*grad2[i]/sigma_y[i]^2
```

Once this is done, the class my_chi2 can be used as the template argument to the fit function:

fit<my_chi2>(some_graph, some_model, ..);

Data to fit

The data to be fitted can be stored either in a dgraph or read from a file. In both cases it is specified as the first argument of the fit routine (a dgraph in the first case, or a string, var or char* in the second case). A 'file' is opened by the openin function (which handles special filenames to be interpreted as pipes, here-documents, remote files, etc).

Specification of columns

You can fit a model to a set of data (either stored in a dgraph, or read from a file). The dgraph is only a storage for data points, it does not specify, which column is to be interpreted as x, y or sigma values. (The graph's drawstyle does that). In the case of data read from a file, you have also no rule, which columns should be interpreted as x, y, etc. Therefore, you have to specify, which columns of the dgraph/file should be interpreted as x, y, sigma_x or sigma_y values. To specify this (for example, columns 1,2 should be x, column 3 should be y value for the fit [a 2D surface], with constant 1.0 errors assumed):

fit<gauss_chi2>(g, model, fitopt().x(_1,_2).y(_3).sigma_x(1.0,1.0).sigma_y(1.0));

Pay attention to have the same number of components for x and sigma_x, and also for y and sigma_y

Limit on iteration steps

To limit the number of iteration steps, call the maxsteps(int) function of fitopt. Giving an argument of 0 will limit the number of iterations to 50 times the number of fitted parameters:

fit<gauss_chi2>(mygraph, mymodel,
fitopt().maxsteps(100) );

Limiting the range of fitting, or any conditions on data points

One often wants to restrict the fitting range, or use only a certain set of data points in the fit, which satisfy some conditions.

To apply a general condition on the original data point (i.e. before deducing the x- and y-values from this point), one can set the condition property of the fitopt to a function, which will be evaluated on the original data point. The point is only used if the condition evaluates to non-zero. The code example below fits only those y-vs-x points (stored in the 1^st and 2^nd columns in the datafile), where the value in the 3^rd column is positive:

function model=PAR(1)*_1+PAR(2);
fit<gauss_chi2>("datafile.dat",model,fitopt().x(_1).y(_2).condition(_3>0));

The general condition described above is the most general way of imposing conditions. One can impose conditions also separately, on the derived x- and y-values. In the example below, _2>0 and fy are two functions, which will be called on each x and y point, respectively. Those points pairs (x,y) , where both of them return non-0, will be used in the fit, the others not:

fit<gauss_chi2>(gr, model, fitopt().x(_4,_5).x_condition(_2>0).y_condition(fy);

Note and remember that these conditions are not evaluated on the original data point, but on the x- and y-values respectively. Therefore, the condition imposed on the x-value means 'use only those points, where the second component of the x-value is larger than 0'. Since in this example the x-value was taken to be the values in the 4^th and 5^th columns, this condition is equivalent to the general condition .condition(_5>0). (If you do know what is _1, or do not understand this, go to the functions' description)

To limit the x-range of fitting, one has the following possibilities (similar functions exist for the y values as well):

fit<gauss_chi2>(gr, model, fitopt().xrange(0,10));
fit<gauss_chi2>(gr, model, fitopt().xmin(0));
fit<gauss_chi2>(gr, model, fitopt().xmax(10));

In this case, only those point-pairs (x,y) will be used in the fit, where the x value is between 0 and 10 (or are larger than 0 in the second case, or are less than 10 in the 3rd case). Or, to be more precise: where the 1st component of the x point (x and y can be multi-valued) satisfies the above conditions.

These range-conditions are implemented using the x_condition, i.e. .xrange(0,10) is equivalent to .x_condition((0<_1) && (_1<10)). This means that if you set xrange first, than impose x_condition, it will remove(override) the xrange-condition.

Verbosity

To see some information after (or during) the fitting procedure, set the verbose level in the fitopt class (3rd parameter of the fit function): fitopt().verbose(2) for example (the bigger number you write, the more printout you see). Verbose level 1 will print a summary at the end of the fit (fitted parameters, covariance matrix, chi2). This is the default. Example:

fit<gauss_chi2>(mygraph,    mymodel,    fitopt().verbose(3)
);

Convergence

The convergence criteria can also be specified through the 3rd argument (fitopt) of the fit function. In order to do this, you have to write a C-function of the type

int conv_criteria(double old_chi2, double new_chi2,
                  const vector<var> &old_pars, const vector<var> &new_pars, 
                  int iteration_step_number);

and specify this function to be the convergence criteria function:

fit<..>(Graph, model, fitopt().convergence(conv_criteria) );

(Do not forget to declare your function having a const vector<var> & argument, and not simply vector<var>) Now, your conv_criteria function will be called after each iteration step, and if it returns 0, it will mean that the fit has not yet converged. A non-0 return value indicates convergence, and the iteration stops. The arguments, with which this function will be called, are self-explanatory from the naming above. If it is not clear, don't hesitate to ask me. The default convergence criterium is to stop the iteration if chi2 becomes 0, or if it has decreased by less then 0.01% of its value.

Fixing parameters

You can fix or release parameters by calling

fitopt::fix(int  parindex,  bool fixit=true)

(Setting the second argument to false will release that parameter). For example, to fit with the 1st parameter being fixed, say:

fit<gauss_chi2>(gr, model, fitopt().fix(1) );

The bool fitopt::fixed(int i) function returns true, if the ith parameter was fixed. The fitopt::fixed() function returns a reference to a vector<int>, which contains the indices of the fixed parameters.

Obtaining results from the fit

The fitted values of the parameters are propagated to the function, so you can obtain them from the function itself. Other interesting results are returned in the fitresult class:

class fitresult
{
  public:
    double chi2;          // the chi2 of the fit

    matrix<double> covar; // the covariancia matrix (1-based indices !!!)

    int nparams;          // number of fitted (free) parameters
    int N;                // number of degrees of freedom (ndata - nfreeparams)

    int nsteps;           // number of iteration steps carried out
};

Example for its usage:

fitresult fr = fit<gauss_chi2>(gr, model, ...);
cerr<<"error of the 1st parameter: "<<sqrt(fr.covar(1,1))<<endl;

Default values:

According to the general guidelines, the default value of each of these properties (x,y,verbose, etc in the fitopt class) can be set via the static fitopt::default_xxx(...) function, where the function xxx(...) would set the given property. For example, if you want to do a lot of fitting, in every case using the graph's 1st column as x, and 2nd and 3rd columns as y values for the fit, you should say:

 
fitopt::default_x(_1); 
fitopt::default_y(_2,_3);

and now you can simply say for a 3-column graph, avoiding the last option (or only specifying other properties, like verbose below):

 
fit<gauss_chi2>(g,model,fitopt().verbose(1) );